Overview

Dataset statistics

Number of variables: 31
Number of observations: 9348
Missing cells: 357
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.2 MiB
Average record size in memory: 248.0 B
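
The derived figures in this block follow from the raw counts. The sketch below redoes that arithmetic under two assumptions that the report does not state: that "Missing cells (%)" is missing cells divided by variables times observations, and that MiB means 2**20 bytes. The variable names are illustrative only.

```python
# Raw counts copied from the dataset statistics of this report.
n_variables = 31
n_observations = 9348
missing_cells = 357
record_size_bytes = 248.0

# Assumption: the percentage is taken over every cell of the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 2**20 bytes.
total_mib = record_size_bytes * n_observations / 2**20

print(f"{total_cells} cells, {missing_pct:.1f}% missing, {total_mib:.1f} MiB")
```

Both rounded results agree with the values shown in the report (0.1% and 2.2 MiB).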

Variable types

BOOL: 17
NUM: 14

Reproduction

Analysis started: 2020-06-14 20:12:33.873291
Analysis finished: 2020-06-14 20:13:19.344613
Duration: 45.47 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
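
The reported duration is the difference between the two timestamps. A minimal check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block of this report.
started = datetime.fromisoformat("2020-06-14 20:12:33.873291")
finished = datetime.fromisoformat("2020-06-14 20:13:19.344613")

duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```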

Warnings

bodyCharCt is highly correlated with numLines [High correlation]
numLines is highly correlated with bodyCharCt [High correlation]
numRec has 282 (3.0%) missing values [Missing]
subExcCt is highly skewed (γ1 = 30.02121095) [Skewed]
numAtt is highly skewed (γ1 = 21.07630037) [Skewed]
numRec is highly skewed (γ1 = 27.10968379) [Skewed]
numDlr is highly skewed (γ1 = 59.26093156) [Skewed]
Unnamed: 0 has unique values [Unique]
subExcCt has 8498 (90.9%) zeros [Zeros]
subQuesCt has 8336 (89.2%) zeros [Zeros]
numAtt has 8782 (93.9%) zeros [Zeros]
hour has 268 (2.9%) zeros [Zeros]
perHTML has 8204 (87.8%) zeros [Zeros]
subBlanks has 233 (2.5%) zeros [Zeros]
forwards has 5592 (59.8%) zeros [Zeros]
numDlr has 7578 (81.1%) zeros [Zeros]
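
The Skewed warnings can be related to the per-variable skewness values listed later in this report. The sketch below assumes a cutoff of 20 on the absolute skewness. That cutoff is an assumption, chosen because it reproduces the four flagged variables here; the report itself does not state the threshold used.

```python
# Skewness of every numeric variable, copied from the Variables section.
skewness = {
    "Unnamed: 0": 0.0,
    "numLines": 19.1572773,
    "bodyCharCt": 9.65440469,
    "subExcCt": 30.02121095,
    "subQuesCt": 7.437372476,
    "numAtt": 21.07630037,
    "numRec": 27.10968379,
    "perCaps": 4.014130081,
    "hour": -0.09694558009,
    "perHTML": 2.990762235,
    "subBlanks": 3.335897406,
    "forwards": 1.997014098,
    "avgWordLen": 6.643624729,
    "numDlr": 59.26093156,
}

SKEW_THRESHOLD = 20  # assumed cutoff, not taken from the report

flagged = sorted(name for name, g1 in skewness.items() if abs(g1) > SKEW_THRESHOLD)
print(flagged)
```

Under that assumption numLines (19.16) narrowly escapes the warning while numAtt (21.08) is flagged.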

Variables

Unnamed: 0
Real number (ℝ≥0)

UNIQUE

Distinct count: 9348
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4673.5
Minimum: 0
Maximum: 9347
Zeros: 1
Zeros (%): < 0.1%
Memory size: 73.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 467.35
Q1: 2336.75
median: 4673.5
Q3: 7010.25
95-th percentile: 8879.65
Maximum: 9347
Range: 9347
Interquartile range (IQR): 4673.5

Descriptive statistics

Standard deviation: 2698.679492
Coefficient of variation (CV): 0.5774429211
Kurtosis: -1.2
Mean: 4673.5
Median Absolute Deviation (MAD): 2337
Skewness: 0
Sum: 43687878
Variance: 7282871
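
Unnamed: 0 takes every integer from 0 to 9347 exactly once, so its statistics can be reproduced without the data file. The sketch below uses Python's `statistics` module. It assumes the report uses the sample variance (n - 1 denominator) and linearly interpolated quartiles, which is what the printed numbers are consistent with.

```python
import statistics

# Unnamed: 0 is a row counter: 0, 1, ..., 9347.
index = list(range(9348))

total = sum(index)
mean = statistics.mean(index)
variance = statistics.variance(index)  # sample variance, n - 1 denominator
std = statistics.stdev(index)
cv = std / mean                        # coefficient of variation

# "inclusive" interpolates linearly between the minimum and the maximum.
q1, median, q3 = statistics.quantiles(index, n=4, method="inclusive")

print(total, mean, variance, round(std, 6), round(cv, 6), q1, median, q3)
```

A column with these properties is typically a leftover row index from a CSV export rather than a feature.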
Histogram with fixed size bins (bins=10)

Value                Count  Frequency (%)
2047                 1      < 0.1%
3291                 1      < 0.1%
3323                 1      < 0.1%
1274                 1      < 0.1%
7417                 1      < 0.1%
5368                 1      < 0.1%
3315                 1      < 0.1%
1266                 1      < 0.1%
7409                 1      < 0.1%
5360                 1      < 0.1%
3307                 1      < 0.1%
1258                 1      < 0.1%
7401                 1      < 0.1%
5352                 1      < 0.1%
3299                 1      < 0.1%
1250                 1      < 0.1%
7393                 1      < 0.1%
5376                 1      < 0.1%
7425                 1      < 0.1%
1282                 1      < 0.1%
3347                 1      < 0.1%
7457                 1      < 0.1%
5408                 1      < 0.1%
3355                 1      < 0.1%
1306                 1      < 0.1%
Other values (9323)  9323   99.7%

Value  Count  Frequency (%)
0      1      < 0.1%
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%

Value  Count  Frequency (%)
9347   1      < 0.1%
9346   1      < 0.1%
9345   1      < 0.1%
9344   1      < 0.1%
9343   1      < 0.1%
9342   1      < 0.1%
9341   1      < 0.1%
9340   1      < 0.1%
9339   1      < 0.1%
9338   1      < 0.1%
 

isSpam
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 73.2 KiB

Value  Count  Frequency (%)
0      6951   74.4%
1      2397   25.6%
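
The frequency column of a Boolean variable is each count divided by the number of observations. A short sketch for isSpam, with the counts copied from the table above:

```python
# isSpam value -> count, as listed in this report.
counts = {0: 6951, 1: 2397}

total = sum(counts.values())
shares = {value: round(100 * n / total, 1) for value, n in counts.items()}

print(total, shares)
```

The counts add up to the 9348 observations, which matches the zero missing values reported for this variable.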
 

isRe
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 73.2 KiB

Value  Count  Frequency (%)
0      6343   67.9%
1      3005   32.1%
 

underscore
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 73.2 KiB

Value  Count  Frequency (%)
0      9222   98.7%
1      126    1.3%
 

priority
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 73.2 KiB

Value  Count  Frequency (%)
0      9294   99.4%
1      54     0.6%
 

isInReplyTo
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 73.2 KiB

Value  Count  Frequency (%)
0      6556   70.1%
1      2792   29.9%
 

sortedRec
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 73.2 KiB

Value  Count  Frequency (%)
1      8400   89.9%
0      948    10.1%
 

subPunc
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 73.2 KiB

Value  Count  Frequency (%)
0      9085   97.2%
1      263    2.8%
 

multipartText
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 73.2 KiB

Value  Count  Frequency (%)
0      9020   96.5%
1      328    3.5%
 

hasImages
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 73.2 KiB

Value  Count  Frequency (%)
0      9326   99.8%
1      22     0.2%
 

isPGPsigned
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 73.2 KiB

Value  Count  Frequency (%)
0      9172   98.1%
1      176    1.9%
 

subSpamWords
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 7
Missing (%): 0.1%
Memory size: 73.2 KiB

Value      Count  Frequency (%)
0          8697   93.0%
1          644    6.9%
(Missing)  7      0.1%
 

noHost
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 1
Missing (%): < 0.1%
Memory size: 73.2 KiB

Value      Count  Frequency (%)
0          9318   99.7%
1          29     0.3%
(Missing)  1      < 0.1%
 

numEnd
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 73.2 KiB

Value  Count  Frequency (%)
0      8209   87.8%
1      1139   12.2%
 

isYelling
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 7
Missing (%): 0.1%
Memory size: 73.2 KiB

Value      Count  Frequency (%)
0          9134   97.7%
1          207    2.2%
(Missing)  7      0.1%
 

isOrigMsg
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 73.2 KiB

Value  Count  Frequency (%)
0      8988   96.1%
1      360    3.9%
 

isDear
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 73.2 KiB

Value  Count  Frequency (%)
0      9270   99.2%
1      78     0.8%
 

isWrote
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 73.2 KiB

Value  Count  Frequency (%)
0      7442   79.6%
1      1906   20.4%
 

numLines
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 457
Unique (%): 4.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 66.90853658536585
Minimum: 2
Maximum: 6319
Zeros: 0
Zeros (%): 0.0%
Memory size: 73.2 KiB

Quantile statistics

Minimum: 2
5-th percentile: 7
Q1: 19
median: 32
Q3: 59
95-th percentile: 258
Maximum: 6319
Range: 6317
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 147.9558858
Coefficient of variation (CV): 2.211315526
Kurtosis: 706.0795696
Mean: 66.90853659
Median Absolute Deviation (MAD): 17
Skewness: 19.1572773
Sum: 625461
Variance: 21890.94413
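
Several of the numLines statistics are derived from others in the same block: the coefficient of variation is the standard deviation over the mean, the variance is the squared standard deviation, and the range and IQR are differences of quantiles. The sketch below checks those relations using only the rounded values printed above, so small rounding differences are expected.

```python
# Values copied from the numLines statistics of this report.
mean = 66.90853659
std = 147.9558858
minimum, q1, q3, maximum = 2, 19, 59, 6319

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum

print(round(cv, 6), round(variance, 2), iqr, value_range)
```

A CV above 2 together with a median of 32 and a maximum of 6319 points to a long right tail, which fits the reported skewness of about 19.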
Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
7                   324    3.5%
6                   234    2.5%
24                  231    2.5%
8                   210    2.2%
27                  199    2.1%
30                  192    2.1%
23                  180    1.9%
32                  180    1.9%
21                  179    1.9%
28                  177    1.9%
25                  171    1.8%
31                  166    1.8%
29                  165    1.8%
26                  165    1.8%
11                  165    1.8%
22                  165    1.8%
16                  163    1.7%
34                  163    1.7%
17                  162    1.7%
13                  161    1.7%
15                  161    1.7%
19                  158    1.7%
33                  153    1.6%
20                  147    1.6%
18                  138    1.5%
Other values (432)  4839   51.8%

Value  Count  Frequency (%)
2      10     0.1%
3      4      < 0.1%
4      13     0.1%
5      16     0.2%
6      234    2.5%
7      324    3.5%
8      210    2.2%
9      110    1.2%
10     109    1.2%
11     165    1.8%

Value  Count  Frequency (%)
6319   2      < 0.1%
2523   1      < 0.1%
1942   1      < 0.1%
1699   2      < 0.1%
1694   2      < 0.1%
1684   2      < 0.1%
1220   1      < 0.1%
1197   1      < 0.1%
1116   2      < 0.1%
1112   2      < 0.1%
 

bodyCharCt
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 3236
Unique (%): 34.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2844.0914634146343
Minimum: 6
Maximum: 188505
Zeros: 0
Zeros (%): 0.0%
Memory size: 73.2 KiB

Quantile statistics

Minimum: 6
5-th percentile: 192
Q1: 587
median: 1088.5
Q3: 2192
95-th percentile: 11508.3
Maximum: 188505
Range: 188499
Interquartile range (IQR): 1605

Descriptive statistics

Standard deviation: 6711.335668
Coefficient of variation (CV): 2.359746778
Kurtosis: 177.0952111
Mean: 2844.091463
Median Absolute Deviation (MAD): 628.5
Skewness: 9.65440469
Sum: 26586567
Variance: 45042026.45

Histogram with fixed size bins (bins=10)

Value                Count  Frequency (%)
572                  23     0.2%
189                  21     0.2%
151                  20     0.2%
156                  17     0.2%
574                  17     0.2%
201                  17     0.2%
810                  16     0.2%
595                  15     0.2%
200                  15     0.2%
906                  15     0.2%
393                  15     0.2%
99                   14     0.1%
250                  14     0.1%
198                  14     0.1%
815                  14     0.1%
187                  14     0.1%
525                  14     0.1%
1027                 14     0.1%
1140                 14     0.1%
542                  13     0.1%
386                  13     0.1%
1109                 13     0.1%
193                  13     0.1%
1081                 13     0.1%
487                  13     0.1%
Other values (3211)  8967   95.9%

Value  Count  Frequency (%)
6      2      < 0.1%
27     1      < 0.1%
39     2      < 0.1%
44     1      < 0.1%
46     1      < 0.1%
51     2      < 0.1%
52     2      < 0.1%
57     2      < 0.1%
60     1      < 0.1%
62     4      < 0.1%

Value   Count  Frequency (%)
188505  2      < 0.1%
124641  2      < 0.1%
106489  1      < 0.1%
86875   1      < 0.1%
86336   1      < 0.1%
86325   1      < 0.1%
84017   2      < 0.1%
71755   1      < 0.1%
71447   2      < 0.1%
67998   1      < 0.1%
 

subExcCt
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 8
Unique (%): 0.1%
Missing: 20
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.13132504288164665
Minimum: 0.0
Maximum: 42.0
Zeros: 8498
Zeros (%): 90.9%
Memory size: 73.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 42
Range: 42
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.6615645999
Coefficient of variation (CV): 5.037611908
Kurtosis: 1740.670382
Mean: 0.1313250429
Median Absolute Deviation (MAD): 0
Skewness: 30.02121095
Sum: 1225
Variance: 0.4376677198

Histogram with fixed size bins (bins=10)

Value      Count  Frequency (%)
0          8498   90.9%
1          624    6.7%
2          122    1.3%
3          54     0.6%
4          13     0.1%
5          9      0.1%
8          7      0.1%
42         1      < 0.1%
(Missing)  20     0.2%

Value  Count  Frequency (%)
0      8498   90.9%
1      624    6.7%
2      122    1.3%
3      54     0.6%
4      13     0.1%
5      9      0.1%
8      7      0.1%
42     1      < 0.1%

Value  Count  Frequency (%)
42     1      < 0.1%
8      7      0.1%
5      9      0.1%
4      13     0.1%
3      54     0.6%
2      122    1.3%
1      624    6.7%
0      8498   90.9%
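
Because subExcCt has only eight distinct values, its summary statistics can be rebuilt from the frequency table above. The sketch below does so and shows that the mean is taken over the 9328 non-missing rows rather than all 9348. The skewness formula is the bias-adjusted Fisher-Pearson estimator. Using that estimator is an assumption about how the report was computed, although the result agrees with the printed γ1 to about two decimals.

```python
import math

# subExcCt value -> count, copied from the frequency table of this report.
freq = {0: 8498, 1: 624, 2: 122, 3: 54, 4: 13, 5: 9, 8: 7, 42: 1}
n_rows = 9348

n = sum(freq.values())        # non-missing observations
missing = n_rows - n
total = sum(value * count for value, count in freq.items())
mean = total / n              # missing rows are excluded from the mean

# Sum of squared deviations, then sample variance (n - 1 denominator).
ss = sum(count * (value - mean) ** 2 for value, count in freq.items())
variance = ss / (n - 1)

# Bias-adjusted Fisher-Pearson skewness (assumed estimator).
m2 = ss / n
m3 = sum(count * (value - mean) ** 3 for value, count in freq.items()) / n
skew = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5

print(n, missing, total, round(mean, 6), round(variance, 6), round(skew, 2))
```

The single observation with value 42 contributes most of the third moment, which is why one email is enough to push the skewness to about 30.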
 

subQuesCt
Real number (ℝ≥0)

ZEROS

Distinct count: 8
Unique (%): 0.1%
Missing: 20
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.13775728987993138
Minimum: 0.0
Maximum: 12.0
Zeros: 8336
Zeros (%): 89.2%
Memory size: 73.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 12
Range: 12
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.5076853212
Coefficient of variation (CV): 3.685360838
Kurtosis: 99.23061573
Mean: 0.1377572899
Median Absolute Deviation (MAD): 0
Skewness: 7.437372476
Sum: 1285
Variance: 0.2577443854

Histogram with fixed size bins (bins=10)

Value      Count  Frequency (%)
0          8336   89.2%
1          870    9.3%
4          61     0.7%
2          42     0.4%
3          14     0.1%
12         2      < 0.1%
8          2      < 0.1%
5          1      < 0.1%
(Missing)  20     0.2%

Value  Count  Frequency (%)
0      8336   89.2%
1      870    9.3%
2      42     0.4%
3      14     0.1%
4      61     0.7%
5      1      < 0.1%
8      2      < 0.1%
12     2      < 0.1%

Value  Count  Frequency (%)
12     2      < 0.1%
8      2      < 0.1%
5      1      < 0.1%
4      61     0.7%
3      14     0.1%
2      42     0.4%
1      870    9.3%
0      8336   89.2%
 

numAtt
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 6
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.06578947368421052
Minimum: 0.0
Maximum: 18.0
Zeros: 8782
Zeros (%): 93.9%
Memory size: 73.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 18
Range: 18
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.3248786054
Coefficient of variation (CV): 4.938154802
Kurtosis: 1016.808563
Mean: 0.06578947368
Median Absolute Deviation (MAD): 0
Skewness: 21.07630037
Sum: 615
Variance: 0.1055461082

Histogram with fixed size bins (bins=10)

Value  Count  Frequency (%)
0      8782   93.9%
1      544    5.8%
2      17     0.2%
5      3      < 0.1%
18     1      < 0.1%
4      1      < 0.1%

Value  Count  Frequency (%)
0      8782   93.9%
1      544    5.8%
2      17     0.2%
4      1      < 0.1%
5      3      < 0.1%
18     1      < 0.1%

Value  Count  Frequency (%)
18     1      < 0.1%
5      3      < 0.1%
4      1      < 0.1%
2      17     0.2%
1      544    5.8%
0      8782   93.9%
 

numRec
Real number (ℝ≥0)

MISSING
SKEWED

Distinct count: 51
Unique (%): 0.6%
Missing: 282
Missing (%): 3.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.929406574012795
Minimum: 0.0
Maximum: 311.0
Zeros: 92
Zeros (%): 1.0%
Memory size: 73.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 4
Maximum: 311
Range: 311
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 5.242396014
Coefficient of variation (CV): 2.717102805
Kurtosis: 1371.669611
Mean: 1.929406574
Median Absolute Deviation (MAD): 0
Skewness: 27.10968379
Sum: 17492
Variance: 27.48271596

Histogram with fixed size bins (bins=10)

Value              Count  Frequency (%)
1                  6802   72.8%
2                  1269   13.6%
3                  345    3.7%
4                  140    1.5%
0                  92     1.0%
5                  73     0.8%
10                 48     0.5%
7                  35     0.4%
8                  34     0.4%
11                 30     0.3%
12                 24     0.3%
6                  24     0.3%
9                  17     0.2%
19                 14     0.1%
14                 13     0.1%
44                 8      0.1%
15                 7      0.1%
16                 7      0.1%
48                 7      0.1%
18                 6      0.1%
21                 5      0.1%
45                 5      0.1%
13                 5      0.1%
46                 5      0.1%
32                 4      < 0.1%
Other values (26)  47     0.5%
(Missing)          282    3.0%

Value  Count  Frequency (%)
0      92     1.0%
1      6802   72.8%
2      1269   13.6%
3      345    3.7%
4      140    1.5%
5      73     0.8%
6      24     0.3%
7      35     0.4%
8      34     0.4%
9      17     0.2%

Value  Count  Frequency (%)
311    1      < 0.1%
75     1      < 0.1%
74     1      < 0.1%
68     1      < 0.1%
66     2      < 0.1%
54     2      < 0.1%
49     3      < 0.1%
48     7      0.1%
47     3      < 0.1%
46     5      0.1%
 

perCaps
Real number (ℝ≥0)

Distinct count: 5201
Unique (%): 55.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.850370586877977
Minimum: 0.0
Maximum: 100.0
Zeros: 9
Zeros (%): 0.1%
Memory size: 73.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2.504472272
Q1: 4.255319149
median: 6.055473246
Q3: 9.059398644
95-th percentile: 26.46754047
Maximum: 100
Range: 100
Interquartile range (IQR): 4.804079495

Descriptive statistics

Standard deviation: 9.58341544
Coefficient of variation (CV): 1.082826459
Kurtosis: 21.6732258
Mean: 8.850370587
Median Absolute Deviation (MAD): 2.174075947
Skewness: 4.014130081
Sum: 82733.26425
Variance: 91.8418515

Histogram with fixed size bins (bins=10)

Value                Count  Frequency (%)
6.666666667          24     0.3%
5.982905983          22     0.2%
11.11111111          22     0.2%
4.385964912          20     0.2%
7.142857143          20     0.2%
5.882352941          19     0.2%
7.692307692          15     0.2%
10.41666667          15     0.2%
12.5                 14     0.1%
4.225352113          14     0.1%
5.555555556          13     0.1%
4                    13     0.1%
6.060606061          12     0.1%
14.98289624          12     0.1%
3.571428571          12     0.1%
4.347826087          11     0.1%
6.818181818          11     0.1%
3.846153846          11     0.1%
5                    11     0.1%
4.545454545          11     0.1%
5.263157895          10     0.1%
5.333333333          10     0.1%
12.04819277          10     0.1%
6.303724928          10     0.1%
4.62633452           10     0.1%
Other values (5176)  8996   96.2%

Value         Count  Frequency (%)
0             9      0.1%
0.3891050584  1      < 0.1%
0.4784688995  1      < 0.1%
0.5424954792  1      < 0.1%
0.5545286506  1      < 0.1%
0.7125890736  1      < 0.1%
0.7936507937  2      < 0.1%
0.8038585209  2      < 0.1%
0.8333333333  1      < 0.1%
0.8403361345  2      < 0.1%

Value        Count  Frequency (%)
100          5      0.1%
99.15396058  1      < 0.1%
98.02283486  1      < 0.1%
97.9410128   1      < 0.1%
97.56715661  1      < 0.1%
96.54178674  2      < 0.1%
96.27906977  1      < 0.1%
95.60193813  1      < 0.1%
93.87008234  1      < 0.1%
85.73667712  1      < 0.1%
 

hour
Real number (ℝ≥0)

ZEROS

Distinct count: 24
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.210847240051347
Minimum: 0.0
Maximum: 23.0
Zeros: 268
Zeros (%): 2.9%
Memory size: 73.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 8
median: 13
Q3: 18
95-th percentile: 22
Maximum: 23
Range: 23
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 6.623932056
Coefficient of variation (CV): 0.5424629369
Kurtosis: -1.092614847
Mean: 12.21084724
Median Absolute Deviation (MAD): 5
Skewness: -0.09694558009
Sum: 114147
Variance: 43.87647588

Histogram with fixed size bins (bins=10)

Value  Count  Frequency (%)
8      1386   14.8%
15     589    6.3%
18     446    4.8%
19     437    4.7%
16     423    4.5%
22     412    4.4%
20     409    4.4%
23     407    4.4%
2      406    4.3%
21     405    4.3%
14     399    4.3%
17     393    4.2%
13     371    4.0%
11     320    3.4%
10     311    3.3%
9      305    3.3%
1      280    3.0%
3      270    2.9%
0      268    2.9%
12     243    2.6%
4      236    2.5%
5      222    2.4%
7      218    2.3%
6      192    2.1%

Value  Count  Frequency (%)
0      268    2.9%
1      280    3.0%
2      406    4.3%
3      270    2.9%
4      236    2.5%
5      222    2.4%
6      192    2.1%
7      218    2.3%
8      1386   14.8%
9      305    3.3%

Value  Count  Frequency (%)
23     407    4.4%
22     412    4.4%
21     405    4.3%
20     409    4.4%
19     437    4.7%
18     446    4.8%
17     393    4.2%
16     423    4.5%
15     589    6.3%
14     399    4.3%
 

perHTML
Real number (ℝ≥0)

ZEROS

Distinct count: 885
Unique (%): 9.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.5170821409935185
Minimum: 0.0
Maximum: 100.0
Zeros: 8204
Zeros (%): 87.8%
Memory size: 73.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 59.57621835
Maximum: 100
Range: 100
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 19.13526602
Coefficient of variation (CV): 2.936170759
Kurtosis: 7.927976454
Mean: 6.517082141
Median Absolute Deviation (MAD): 0
Skewness: 2.990762235
Sum: 60921.68385
Variance: 366.1584056

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
0                   8204   87.8%
86.39674379         8      0.1%
30.2303263          6      0.1%
23.67890903         5      0.1%
35.09183275         5      0.1%
23.75690608         5      0.1%
32.67086767         4      < 0.1%
47.5526075          4      < 0.1%
35.96526656         4      < 0.1%
41.74392383         4      < 0.1%
58.3497053          4      < 0.1%
63.16162571         4      < 0.1%
45.36500579         4      < 0.1%
62.18655968         4      < 0.1%
48.76794515         4      < 0.1%
66.72920575         4      < 0.1%
32.24409449         4      < 0.1%
17.97268152         4      < 0.1%
50.17459038         4      < 0.1%
100                 4      < 0.1%
36.07032058         4      < 0.1%
13.14662497         3      < 0.1%
28.93112718         3      < 0.1%
24.06779661         3      < 0.1%
64.94949495         3      < 0.1%
Other values (860)  1043   11.2%

Value        Count  Frequency (%)
0            8204   87.8%
7.852882704  2      < 0.1%
8.474576271  2      < 0.1%
8.920454545  1      < 0.1%
10.38154392  1      < 0.1%
10.89987326  1      < 0.1%
11.05807109  1      < 0.1%
11.28880527  1      < 0.1%
11.29411765  1      < 0.1%
11.84065234  1      < 0.1%

Value        Count  Frequency (%)
100          4      < 0.1%
98.83040936  1      < 0.1%
98.24478178  1      < 0.1%
97.5743349   1      < 0.1%
97.34375     1      < 0.1%
96.16037336  1      < 0.1%
96.05749921  1      < 0.1%
94.05594406  1      < 0.1%
93.95080877  2      < 0.1%
93.81582273  1      < 0.1%
 

subBlanks
Real number (ℝ≥0)

ZEROS

Distinct count: 546
Unique (%): 5.9%
Missing: 20
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.866939237078649
Minimum: 0.0
Maximum: 86.41975308641977
Zeros: 233
Zeros (%): 2.5%
Memory size: 73.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 5.555555556
Q1: 10.52631579
median: 13.25301205
Q3: 15.68627451
95-th percentile: 21.42857143
Maximum: 86.41975309
Range: 86.41975309
Interquartile range (IQR): 5.15995872

Descriptive statistics

Standard deviation: 7.431937546
Coefficient of variation (CV): 0.5359464997
Kurtosis: 18.46340608
Mean: 13.86693924
Median Absolute Deviation (MAD): 2.538726334
Skewness: 3.335897406
Sum: 129350.8092
Variance: 55.23369569

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
14.28571429         468    5.0%
12.5                398    4.3%
10                  298    3.2%
16.66666667         253    2.7%
11.11111111         251    2.7%
9.090909091         247    2.6%
0                   233    2.5%
15.38461538         208    2.2%
13.33333333         190    2.0%
11.76470588         161    1.7%
13.63636364         151    1.6%
12.12121212         150    1.6%
17.64705882         118    1.3%
15.78947368         110    1.2%
18.18181818         107    1.1%
15                  105    1.1%
13.79310345         104    1.1%
14.70588235         103    1.1%
12                  102    1.1%
14.81481481         102    1.1%
8.333333333         101    1.1%
10.52631579         100    1.1%
6.666666667         100    1.1%
11.53846154         97     1.0%
9.523809524         95     1.0%
Other values (521)  4976   53.2%

Value        Count  Frequency (%)
0            233    2.5%
1.333333333  2      < 0.1%
1.785714286  2      < 0.1%
2.43902439   4      < 0.1%
2.857142857  3      < 0.1%
2.898550725  1      < 0.1%
3.03030303   2      < 0.1%
3.225806452  3      < 0.1%
3.333333333  1      < 0.1%
3.370786517  2      < 0.1%

Value        Count  Frequency (%)
86.41975309  1      < 0.1%
84.61538462  1      < 0.1%
84.42211055  1      < 0.1%
77.6119403   1      < 0.1%
74.28571429  1      < 0.1%
71.92982456  1      < 0.1%
70.68965517  2      < 0.1%
69.13580247  2      < 0.1%
65.625       7      0.1%
65.51724138  1      < 0.1%
 

forwards
Real number (ℝ≥0)

ZEROS

Distinct count: 853
Unique (%): 9.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.445085866326238
Minimum: 0.0
Maximum: 99.0582695703355
Zeros: 5592
Zeros (%): 59.8%
Memory size: 73.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 15.38461538
95-th percentile: 52
Maximum: 99.05826957
Range: 99.05826957
Interquartile range (IQR): 15.38461538

Descriptive statistics

Standard deviation: 18.26357585
Coefficient of variation (CV): 1.748532858
Kurtosis: 3.596424676
Mean: 10.44508587
Median Absolute Deviation (MAD): 0
Skewness: 1.997014098
Sum: 97640.66268
Variance: 333.5582027

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
0                   5592   59.8%
20                  60     0.6%
33.33333333         58     0.6%
25                  57     0.6%
50                  52     0.6%
12.5                48     0.5%
14.28571429         48     0.5%
11.11111111         40     0.4%
16.66666667         36     0.4%
13.33333333         34     0.4%
40                  32     0.3%
10                  31     0.3%
9.090909091         31     0.3%
22.22222222         29     0.3%
30                  28     0.3%
18.18181818         27     0.3%
28.57142857         26     0.3%
17.64705882         25     0.3%
15                  25     0.3%
18.75               24     0.3%
27.27272727         24     0.3%
5                   24     0.3%
8.333333333         24     0.3%
13.04347826         24     0.3%
6.666666667         23     0.2%
Other values (828)  2926   31.3%

Value          Count  Frequency (%)
0              5592   59.8%
0.01582528881  2      < 0.1%
0.146627566    1      < 0.1%
0.1773049645   2      < 0.1%
0.1828153565   2      < 0.1%
0.1956947162   2      < 0.1%
0.2178649237   1      < 0.1%
0.2406738869   1      < 0.1%
0.2624671916   2      < 0.1%
0.2645502646   1      < 0.1%

Value        Count  Frequency (%)
99.05826957  2      < 0.1%
98.36400818  3      < 0.1%
98.05194805  2      < 0.1%
95.40229885  1      < 0.1%
94.27312775  1      < 0.1%
93.89312977  4      < 0.1%
92.61603376  2      < 0.1%
91.82692308  1      < 0.1%
91.75257732  2      < 0.1%
91.73553719  1      < 0.1%
 

avgWordLen
Real number (ℝ≥0)

Distinct count: 5020
Unique (%): 53.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.487221791442675
Minimum: 1.3630718540108986
Maximum: 26.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 73.2 KiB

Quantile statistics

Minimum: 1.363071854
5-th percentile: 3.822419128
Q1: 4.208256552
median: 4.454545455
Q3: 4.728506787
95-th percentile: 5.225085079
Maximum: 26
Range: 24.63692815
Interquartile range (IQR): 0.5202502353

Descriptive statistics

Standard deviation: 0.568582053
Coefficient of variation (CV): 0.1267113772
Kurtosis: 227.7112775
Mean: 4.487221791
Median Absolute Deviation (MAD): 0.2590357305
Skewness: 6.643624729
Sum: 41946.54931
Variance: 0.323285551

Histogram with fixed size bins (bins=10)

Value                Count  Frequency (%)
5                    67     0.7%
4.5                  49     0.5%
4                    36     0.4%
4.6                  24     0.3%
4.875                24     0.3%
4.166666667          24     0.3%
4.833333333          23     0.2%
5.181818182          21     0.2%
4.333333333          21     0.2%
4.2                  21     0.2%
4.8                  17     0.2%
4.611111111          16     0.2%
4.3                  16     0.2%
4.75                 16     0.2%
4.818181818          15     0.2%
4.1                  14     0.1%
4.615384615          14     0.1%
4.25                 13     0.1%
4.285714286          13     0.1%
4.4                  13     0.1%
4.666666667          13     0.1%
4.291666667          12     0.1%
4.235294118          12     0.1%
4.929824561          12     0.1%
5.333333333          12     0.1%
Other values (4995)  8830   94.5%

Value        Count  Frequency (%)
1.363071854  1      < 0.1%
1.375862069  1      < 0.1%
1.377539287  1      < 0.1%
1.395973154  2      < 0.1%
1.406744446  2      < 0.1%
1.463157895  6      0.1%
1.5          2      < 0.1%
1.5125       2      < 0.1%
1.719934774  2      < 0.1%
1.7833764    2      < 0.1%

Value        Count  Frequency (%)
26           1      < 0.1%
9.433891993  1      < 0.1%
9.038461538  2      < 0.1%
9.037267081  1      < 0.1%
8.747368421  2      < 0.1%
8.194366197  2      < 0.1%
8.18134715   2      < 0.1%
8.167630058  2      < 0.1%
8.162911612  2      < 0.1%
8.15407855   2      < 0.1%
 

numDlr
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 56
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.7815575524176295
Minimum: 0
Maximum: 1977
Zeros: 7578
Zeros (%): 81.1%
Memory size: 73.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 5
Maximum: 1977
Range: 1977
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 30.3804554
Coefficient of variation (CV): 17.05274991
Kurtosis: 3825.174072
Mean: 1.781557552
Median Absolute Deviation (MAD): 0
Skewness: 59.26093156
Sum: 16654
Variance: 922.9720702

Histogram with fixed size bins (bins=10)

Value              Count  Frequency (%)
0                  7578   81.1%
1                  568    6.1%
2                  409    4.4%
3                  202    2.2%
4                  120    1.3%
5                  76     0.8%
6                  67     0.7%
7                  38     0.4%
8                  26     0.3%
14                 25     0.3%
10                 24     0.3%
16                 20     0.2%
12                 20     0.2%
13                 20     0.2%
9                  18     0.2%
19                 16     0.2%
17                 11     0.1%
11                 10     0.1%
159                7      0.1%
162                6      0.1%
45                 5      0.1%
15                 5      0.1%
27                 5      0.1%
34                 4      < 0.1%
23                 4      < 0.1%
Other values (31)  64     0.7%

Value  Count  Frequency (%)
0      7578   81.1%
1      568    6.1%
2      409    4.4%
3      202    2.2%
4      120    1.3%
5      76     0.8%
6      67     0.7%
7      38     0.4%
8      26     0.3%
9      18     0.2%

Value  Count  Frequency (%)
1977   2      < 0.1%
218    2      < 0.1%
181    2      < 0.1%
180    1      < 0.1%
176    1      < 0.1%
162    6      0.1%
159    7      0.1%
148    2      < 0.1%
144    1      < 0.1%
138    2      < 0.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
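
The calculation described above can be sketched in plain Python. The function name `pearson_r` and the sample data are illustrative and not part of the report. The covariance and the standard deviations share the same normalising constant, so it cancels and is left out.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
z = [1, 3, 2, 5, 4]

r = pearson_r(x, z)
# Shifting and rescaling x by a positive factor leaves r unchanged.
r_rescaled = pearson_r([3 * a + 7 for a in x], z)
print(r, r_rescaled)
```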

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
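
A minimal sketch of that procedure: replace each value by its rank, then apply the Pearson formula to the ranks. Giving tied values their average rank is an assumption made here, since the text above does not say how ties are handled. All function names are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
cubes = [a ** 3 for a in x]  # monotonic but not linear
print(spearman_rho(x, cubes), pearson_r(x, cubes))
```

For the cubic example ρ is 1 while r stays below 1, which is the behaviour the paragraph above describes.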

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
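
The pair counting can be sketched directly. The version below implements the formula exactly as worded above, which is often called tau-a. Tied pairs count as neither concordant nor discordant, and tie-corrected variants such as tau-b can give different values when ties are present. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables order the pair the same way.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4, 5]
z = [1, 3, 2, 5, 4]
print(kendall_tau_a(x, z))
```

In this example 8 of the 10 pairs are concordant and 2 are discordant, giving τ = 0.6.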

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

Unnamed: 0, isSpam, isRe, underscore, priority, isInReplyTo, sortedRec, subPunc, multipartText, hasImages, isPGPsigned, subSpamWords, noHost, numEnd, isYelling, isOrigMsg, isDear, isWrote, numLines, bodyCharCt, subExcCt, subQuesCt, numAtt, numRec, perCaps, hour, perHTML, subBlanks, forwards, avgWordLen, numDlr
0001001100000.00.000.00005015540.00.00.02.04.45103911.00.012.5000000.0000004.3766233
1100000100000.00.000.0000268730.00.00.01.07.49128911.00.08.0000000.0000004.5555560
2200000100000.00.000.00003817130.00.00.01.07.43609612.00.08.0000000.0000004.8171640
3300000100000.00.000.00003210950.00.00.00.05.09090913.00.018.9189193.1250004.7142860
4401000100000.00.000.00003110210.00.00.01.06.11664313.00.015.2173916.4516134.2349400
5501001100000.00.000.0000257180.00.00.01.07.62527213.00.015.21739112.0000003.9568970
6600000100000.00.000.00003812880.00.00.01.06.34371413.00.017.0212770.0000004.0514020
7701000100000.00.000.00013911820.00.00.01.06.61764714.00.015.21739112.8205134.0396040
8800000100000.00.000.000012659890.00.00.01.03.16136114.00.06.2500000.0000004.4052220
9901001100000.00.000.00005015540.00.00.02.04.45103911.00.012.5000000.0000004.3766233

Last rows

Unnamed: 0, isSpam, isRe, underscore, priority, isInReplyTo, sortedRec, subPunc, multipartText, hasImages, isPGPsigned, subSpamWords, noHost, numEnd, isYelling, isOrigMsg, isDear, isWrote, numLines, bodyCharCt, subExcCt, subQuesCt, numAtt, numRec, perCaps, hour, perHTML, subBlanks, forwards, avgWordLen, numDlr
9338933810000100000.00.010.0000196108100.00.00.01.04.62203018.083.95591814.2857140.0000004.3679255
9339933910000100000.00.000.00007233881.00.00.01.048.85662412.00.00000010.0000000.0000007.04603645
9340934010000100000.00.000.000012342580.00.00.01.018.22897416.060.00517112.9032260.0000004.2909380
9341934110000000000.00.010.000012159450.00.00.01.025.32154319.037.3233753.7190080.8264464.1512792
9342934210000100001.00.010.0000154610.00.00.01.02.86885221.00.00000014.2857140.0000004.6923080
9343934310000100000.00.000.0000245148411.01.00.01.08.57255221.079.65711113.7931030.0000004.7005552
9344934410000101000.00.000.00005812880.00.01.01.09.43600923.00.00000010.5263160.0000004.9042554
9345934510000000001.00.000.00005324200.00.00.01.02.4184488.00.00000020.0000000.0000004.7037040
9346934610000100000.00.000.0000279235020.00.00.01.07.79540023.00.0000005.2631580.0000005.25269080
9347934710000100000.00.011.00003525160.00.00.01.04.7839516.00.00000014.2857140.0000004.8238211